Build a RAG-based question answering solution using Amazon Bedrock Knowledge Base, Amazon OpenSearch Service Serverless with vector search and LangChain
Amit Arora
One of the most common applications of generative AI and large language models (LLMs) in an enterprise environment is answering questions based on the enterprise’s knowledge corpus. Amazon Lex provides the framework for building AI-based chatbots. Pre-trained foundation models (FMs) perform well at natural language understanding (NLU) tasks such as summarization, text generation, and question answering on a broad variety of topics, but they either struggle to provide accurate (hallucination-free) answers or fail completely at answering questions about content they haven’t seen as part of their training data. Furthermore, FMs are trained on a point-in-time snapshot of data and have no inherent ability to access fresh data at inference time; without this ability, they might provide responses that are incorrect or inadequate.
A commonly used approach to address this problem is a technique called Retrieval Augmented Generation (RAG). In the RAG-based approach, we convert the user question into a vector embedding using an embeddings model and then run a similarity search for this embedding against a pre-populated vector database that holds the embeddings for the enterprise knowledge corpus. A small number of similar documents (typically three) is added as context, along with the user question, to the “prompt” provided to a text generation LLM, which then answers the user question using the information provided as context in the prompt. RAG models were introduced by Lewis et al. in 2020 as models whose parametric memory is a pre-trained seq2seq model and whose non-parametric memory is a dense vector index of Wikipedia, accessed with a pre-trained neural retriever. To understand the overall structure of a RAG-based approach, refer to Build a powerful question answering bot with Amazon SageMaker, Amazon OpenSearch Service, Streamlit, and LangChain.
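To make the retrieval step concrete, here is a brute-force sketch of similarity search over a handful of toy vectors. It ranks documents by Euclidean (L2) distance, the metric this post configures for the vector index later on. The two-dimensional vectors and file names are made-up stand-ins for real embeddings, and a vector store such as AOSS would typically use an approximate nearest-neighbor index rather than scanning every vector as this sketch does.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """L2 (Euclidean) distance between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query: Sequence[float], corpus: Dict[str, Sequence[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Exact (brute-force) k-nearest-neighbor search: the k closest documents, nearest first."""
    scored = ((doc_id, euclidean(query, vec)) for doc_id, vec in corpus.items())
    return heapq.nsmallest(k, scored, key=lambda item: item[1])


# Example with made-up 2-d "embeddings" (the index in this post uses 1,536 dimensions).
toy_corpus = {"xgboost.html": [0.9, 0.1], "pytorch.html": [0.1, 0.9], "autopilot.html": [0.5, 0.5]}
print(top_k([1.0, 0.0], toy_corpus, k=2))
```

With this toy data, the query vector is closest to `xgboost.html`, followed by `autopilot.html`; those two documents would become the context for the prompt.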
In this post, we provide a step-by-step guide with all the building blocks for creating an enterprise-ready RAG application such as a question answering solution. We use models available through Amazon Bedrock for embeddings (Amazon Titan Text Embeddings v2) and text generation (Anthropic Claude v2), along with Amazon Bedrock Knowledge Base. The text corpus representing an enterprise knowledge base is stored as HTML files in Amazon S3 and is ingested, in the form of text embeddings, into an index in an Amazon OpenSearch Service Serverless (AOSS) collection by a Bedrock knowledge base agent in a fully managed, serverless fashion.
We provide an AWS CloudFormation template to stand up all the resources required for building this solution. We then demonstrate how to use LangChain to interface with Bedrock and opensearch-py to interface with AOSS to build a RAG-based question answering workflow.
Solution overview
We use a subset of SageMaker docs as the knowledge corpus for this post. The data is available as HTML files in an S3 bucket. A Bedrock Knowledge Base Agent reads these files, splits them into smaller chunks, encodes these chunks into vectors (embeddings), and ingests these embeddings into an AOSS collection index. We implement the RAG functionality in a notebook: a set of SageMaker-related questions is first asked of the Claude model without any additional context, and then the same questions are asked again, this time with context based on similar documents retrieved from AOSS (that is, using the RAG approach). We demonstrate that the responses generated without RAG can be factually inaccurate, whereas the RAG-based responses are accurate and more useful.
All the code for this post is available in the GitHub repo.
The following figure represents the high-level architecture of the proposed solution.
Step-by-step explanation:
- The user provides a question via the Jupyter notebook.
- The question is converted into an embedding by the Titan Embeddings v2 model on Bedrock.
- The embedding is used to find similar documents from an AOSS index.
- The similar documents, along with the user question, are used to create a “prompt”.
- The prompt is provided to Bedrock to generate a response using the Claude v2 model.
- The response along with the context is printed out in a notebook cell.
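The six steps above can be sketched as a small pipeline. This is not the notebook’s code: the function names, the prompt wording, and the choice to pass the embedding, retrieval, and generation steps in as plain callables are assumptions made so that the flow can be read (and tested) without AWS access.

```python
from typing import Callable, List, Sequence

# Hypothetical signatures for the three pluggable steps.
Embedder = Callable[[str], List[float]]           # question -> embedding (Titan on Bedrock)
Retriever = Callable[[List[float], int], List[str]]  # embedding, k -> similar passages (AOSS)
Generator = Callable[[str], str]                  # prompt -> answer (Claude on Bedrock)


def build_prompt(question: str, passages: Sequence[str]) -> str:
    """Combine retrieved passages and the question into one prompt (illustrative wording)."""
    context = "\n".join(passages)
    return (
        "Use the following context to answer the question.\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def answer(question: str, embed: Embedder, retrieve: Retriever,
           generate: Generator, k: int = 3) -> str:
    query_vector = embed(question)             # step 2: question -> embedding
    passages = retrieve(query_vector, k)       # step 3: similar documents from the index
    prompt = build_prompt(question, passages)  # step 4: build the prompt
    return generate(prompt)                    # step 5: LLM generates the response
```

Keeping each step behind a callable means the same flow works whether retrieval goes through opensearch-py directly, as in the notebook, or through another client.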
As illustrated in the architecture diagram, we use the following AWS services:
- Bedrock for access to the LLMs for embedding and text generation as well as for the knowledge base agent.
- OpenSearch Service Serverless with vector search for storing the embeddings of the enterprise knowledge corpus and doing similarity search with user questions.
- S3 for storing the raw knowledge corpus data (HTML files).
- AWS Identity and Access Management roles and policies for access management.
- AWS CloudFormation for creating the entire solution stack through infrastructure as code.
In terms of open-source packages, we use LangChain to interface with Bedrock and opensearch-py to interface with AOSS.
The workflow for instantiating the solution presented in this post in your own AWS account is as follows:
Run the CloudFormation template provided with this post in your account. This creates all the infrastructure resources needed for this solution:
- AOSS collection
- SageMaker Notebook
- IAM roles
Create a vector index in the AOSS collection. This is done through the AOSS console.
Create a knowledge base in Bedrock and sync data from the S3 bucket to the AOSS index. This is done through the Bedrock console.
Run the rag_w_bedrock_and_aoss.ipynb notebook in the SageMaker notebook to ask questions based on the data ingested in the AOSS index.
These steps are discussed in detail in the following sections.
Prerequisites
To implement the solution provided in this post, you should have an AWS account and familiarity with LLMs, OpenSearch Service, and Bedrock.
Use AWS CloudFormation to create the solution stack
Choose Launch Stack for the Region you want to deploy resources to. All parameters needed by the CloudFormation template have default values already filled in, except for the ARN of the IAM role with which you are currently logged in to your AWS account, which you have to provide. Make a note of the OpenSearch Service collection ARN; we use it in subsequent steps. The template takes about 5 minutes to complete.
| AWS Region | Link |
|---|---|
| us-east-1 (N. Virginia) | |
| us-west-2 (Oregon) | |
After the stack is created successfully, navigate to the stack’s Outputs tab on the AWS CloudFormation console and note the values for CollectionARN and AOSSVectorIndexName. We use those in the subsequent steps.
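If you prefer to read these values programmatically, the stack outputs can be fetched with the CloudFormation `describe_stacks` API. The sketch below assumes a boto3 CloudFormation client is passed in; the stack name in the usage comment is hypothetical, so substitute the name you chose when launching the stack.

```python
from typing import Any, Dict


def get_stack_outputs(cfn_client: Any, stack_name: str) -> Dict[str, str]:
    """Return a {OutputKey: OutputValue} mapping for a CloudFormation stack."""
    response = cfn_client.describe_stacks(StackName=stack_name)
    stack = response["Stacks"][0]
    # A stack without outputs omits the "Outputs" key entirely.
    return {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}


# Example usage (requires AWS credentials; the stack name is hypothetical):
#   import boto3
#   outputs = get_stack_outputs(boto3.client("cloudformation"), "my-rag-stack")
#   print(outputs["CollectionARN"], outputs["AOSSVectorIndexName"])
```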
Create an AOSS vector index
The CloudFormation stack creates an AOSS collection; the next step is to create a vector index. This is done through the AOSS console as described below.
Navigate to the OpenSearch Service console and choose Collections. The `sagemaker-kb` collection created by the CloudFormation stack is listed there.
Figure 3: SageMaker Knowledge Base Collection
Choose the `sagemaker-kb` link to create a vector index for storing the embeddings from the documents in S3.
Figure 4: SageMaker Knowledge Base Vector Index
Set the vector index name as `sagemaker-readthedocs-io`, the vector field name as `vector`, the dimensions as `1536`, and the distance metric as `Euclidean`. You must set these parameters exactly as mentioned here because the Bedrock Knowledge Base Agent uses these same values.
Figure 5: SageMaker Knowledge Base Vector Index Parameters
Once created, the vector index is listed as part of the collection.
Figure 6: SageMaker Knowledge Base Vector Index Created
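The same index settings can be written down as a request body for opensearch-py’s `indices.create` call. Treat this as an assumption-laden equivalent, not what the console does internally: the vector settings (`vector`, 1536 dimensions, Euclidean, which OpenSearch calls `l2`) come from this post, the `text` and `metadata` field names are inferred from the notebook code that reads `_source['text']` and `_source['metadata']`, and the FAISS engine with HNSW is our assumption about what Bedrock knowledge bases expect. Check the current Bedrock documentation before relying on it.

```python
from typing import Any, Dict


def vector_index_body(vector_field: str = "vector", dimension: int = 1536,
                      text_field: str = "text", metadata_field: str = "metadata") -> Dict[str, Any]:
    """Index settings and mappings mirroring the console values used in this post."""
    return {
        "settings": {"index": {"knn": True}},  # enable k-NN search on this index
        "mappings": {
            "properties": {
                vector_field: {
                    "type": "knn_vector",
                    "dimension": dimension,  # must match the embedding model's output size
                    # Assumed engine/method; verify against Bedrock knowledge base requirements.
                    "method": {"name": "hnsw", "engine": "faiss", "space_type": "l2"},
                },
                text_field: {"type": "text"},                       # chunk text (inferred name)
                metadata_field: {"type": "text", "index": False},   # stored, not searched (inferred name)
            }
        },
    }


# Example usage, given an opensearch-py client for the collection:
#   client.indices.create(index="sagemaker-readthedocs-io", body=vector_index_body())
```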
Create a Bedrock knowledge base
Once the AOSS collection and vector index have been created, it is time to set up the Bedrock knowledge base.
Navigate to the Bedrock console, choose Knowledge base, and then choose the Create knowledge base button.
Figure 7: Bedrock Knowledge Base
Fill out the details for creating the knowledge base as shown in the following screenshots.
Figure 8: Bedrock Knowledge Base
Select the S3 bucket.
Figure 9: Bedrock Knowledge Base S3 bucket
The Titan embeddings model is automatically selected.
Figure 10: Bedrock Knowledge Base embeddings model
Select Amazon OpenSearch Service Serverless from the available vector database options.
Figure 11: Bedrock Knowledge Base AOSS
Review and create the knowledge base by choosing the Create knowledge base button.
Figure 12: Bedrock Knowledge Base Review & Create
The knowledge base should now be created.
Figure 13: Bedrock Knowledge Base create complete
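To drive the sync from code instead of the console, you need the knowledge base ID and its data source ID. The sketch below looks them up by name with the `bedrock-agent` `list_knowledge_bases` and `list_data_sources` APIs, following `nextToken` pagination. It assumes the knowledge base has a single S3 data source, as in this walkthrough; the knowledge base name in the usage comment is hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple


def _collect(list_call: Callable[..., Dict[str, Any]], key: str, **kwargs: Any) -> List[Dict[str, Any]]:
    """Follow nextToken pagination for a bedrock-agent list_* call and gather all items."""
    items: List[Dict[str, Any]] = []
    token = None
    while True:
        if token:
            kwargs["nextToken"] = token
        page = list_call(**kwargs)
        items.extend(page.get(key, []))
        token = page.get("nextToken")
        if not token:
            return items


def find_kb_ids(agent_client: Any, kb_name: str) -> Tuple[str, str]:
    """Return (knowledgeBaseId, dataSourceId) for the knowledge base named kb_name."""
    kbs = _collect(agent_client.list_knowledge_bases, "knowledgeBaseSummaries")
    matches = [kb for kb in kbs if kb["name"] == kb_name]
    if not matches:
        raise LookupError(f"no knowledge base named {kb_name!r}")
    kb_id = matches[0]["knowledgeBaseId"]
    sources = _collect(agent_client.list_data_sources, "dataSourceSummaries", knowledgeBaseId=kb_id)
    if not sources:
        raise LookupError(f"knowledge base {kb_id} has no data sources")
    # This walkthrough configures one S3 data source, so take the first.
    return kb_id, sources[0]["dataSourceId"]


# Example usage (hypothetical knowledge base name):
#   import boto3
#   kb_id, ds_id = find_kb_ids(boto3.client("bedrock-agent"), "my-sagemaker-docs-kb")
```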
Sync the Bedrock knowledge base
Once the Bedrock knowledge base is created, we are ready to sync the data (raw documents) in S3 to embeddings in the AOSS collection vector index.
Start the sync by choosing the Sync button; the button label changes to Syncing.
Figure 14: Bedrock Knowledge Base sync
Once the sync completes, the status changes to Ready.
Figure 15: Bedrock Knowledge Base sync completed
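The Sync button corresponds to an ingestion job in the `bedrock-agent` API. The sketch below starts a job with `start_ingestion_job` and polls `get_ingestion_job` until the job leaves the `STARTING` and `IN_PROGRESS` states; treating `COMPLETE` as the API counterpart of the console’s Ready status is our assumption. The sleep function is injectable so the polling loop can be exercised without waiting.

```python
import time
from typing import Any, Callable


def sync_knowledge_base(agent_client: Any, kb_id: str, ds_id: str,
                        poll_seconds: float = 10.0,
                        sleep: Callable[[float], None] = time.sleep) -> str:
    """Start an ingestion job, wait for it to finish, and return its job ID."""
    job = agent_client.start_ingestion_job(knowledgeBaseId=kb_id, dataSourceId=ds_id)["ingestionJob"]
    job_id = job["ingestionJobId"]
    status = job["status"]
    # Keep polling while the job is still queued or running.
    while status in ("STARTING", "IN_PROGRESS"):
        sleep(poll_seconds)
        job = agent_client.get_ingestion_job(
            knowledgeBaseId=kb_id, dataSourceId=ds_id, ingestionJobId=job_id
        )["ingestionJob"]
        status = job["status"]
    if status != "COMPLETE":
        raise RuntimeError(f"ingestion job {job_id} ended with status {status}")
    return job_id


# Example usage (IDs would come from the Bedrock console or a lookup by name):
#   import boto3
#   sync_knowledge_base(boto3.client("bedrock-agent"), "<knowledge-base-id>", "<data-source-id>")
```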
Run the RAG notebook
Now we are all set to ask some questions of our newly created knowledge base. The CloudFormation template creates a SageMaker Notebook that contains the code to demonstrate this.
Navigate to SageMaker Notebooks, find the notebook named `rag-w-bedrock-kb-notebook`, and choose Open JupyterLab.
Figure 16: RAG with Bedrock KB notebook
When JupyterLab opens, choose `rag_w_bedrock_and_aoss.ipynb` to open the notebook. The notebook code demonstrates how to use Bedrock together with the LangChain and opensearch-py packages to implement the RAG technique for question answering.
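The notebook snippets call an opensearch-py client (`client`, passed in as `aoss_client`) whose construction is not shown in this post. A minimal sketch, assuming opensearch-py 2.x with `AWSV4SignerAuth` signing requests for the `aoss` service and the collection endpoint copied from the AOSS console, could look like this; the endpoint and Region in the usage comment are placeholders.

```python
from urllib.parse import urlparse


def aoss_host(collection_endpoint: str) -> str:
    """Return the bare host name from an AOSS collection endpoint (with or without https://)."""
    parsed = urlparse(collection_endpoint)
    # Without a scheme, urlparse leaves the whole string in .path rather than .netloc.
    host = parsed.netloc or parsed.path
    return host.rstrip("/")


def make_aoss_client(collection_endpoint: str, region: str):
    """Build an opensearch-py client whose requests are SigV4-signed for the 'aoss' service."""
    # Imported inside the function so aoss_host() works without boto3/opensearch-py installed.
    import boto3
    from opensearchpy import AWSV4SignerAuth, OpenSearch, RequestsHttpConnection

    credentials = boto3.Session().get_credentials()
    auth = AWSV4SignerAuth(credentials, region, "aoss")
    return OpenSearch(
        hosts=[{"host": aoss_host(collection_endpoint), "port": 443}],
        http_auth=auth,
        use_ssl=True,
        verify_certs=True,
        connection_class=RequestsHttpConnection,
    )


# Example usage (placeholder endpoint; requires credentials with access to the collection):
#   client = make_aoss_client("https://<collection-id>.us-east-1.aoss.amazonaws.com", "us-east-1")
```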
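We access the models available via Bedrock using the `Bedrock` and `BedrockEmbeddings` classes from the LangChain package.

```python
# Import paths vary by LangChain version: older releases expose these classes as
# langchain.llms.bedrock.Bedrock and langchain.embeddings.BedrockEmbeddings.
from langchain_community.embeddings import BedrockEmbeddings
from langchain_community.llms import Bedrock

# we will use Anthropic Claude for text generation
claude_llm = Bedrock(
    model_id="anthropic.claude-v2",
    model_kwargs=dict(temperature=0.5, max_tokens_to_sample=300, top_k=250, top_p=1, stop_sequences=[]),
)

# we will be using the Titan Embeddings Model to generate our Embeddings.
embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-g1-text-02")
```

Interface to AOSS is through the opensearch-py package.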
```python
from typing import Dict

from opensearchpy import OpenSearch


# Functions to talk to OpenSearch

# Define queries for OpenSearch
def query_docs(query: str, embeddings: BedrockEmbeddings, aoss_client: OpenSearch, index: str, k: int = 3) -> Dict:
    """
    Convert the query into embedding and then find similar documents from AOSS
    """
    # embedding
    query_embedding = embeddings.embed_query(query)

    # query to lookup OpenSearch kNN vector. Can add any metadata fields based filtering
    # here as part of this query. The outer "vector" key is the vector field name set on
    # the index; the inner "vector" key holds the query embedding.
    query_qna = {
        "size": k,
        "query": {
            "knn": {
                "vector": {
                    "vector": query_embedding,
                    "k": k
                }
            }
        }
    }

    # OpenSearch API call
    relevant_documents = aoss_client.search(
        body=query_qna,
        index=index
    )
    return relevant_documents
```

We build the context for the prompt from the documents retrieved from AOSS as follows.
```python
def create_context_for_query(q: str, embeddings: BedrockEmbeddings, aoss_client: OpenSearch, vector_index: str) -> str:
    """
    Create a context out of the similar docs retrieved from the vector database
    by concatenating the text from the similar documents.
    """
    print(f"query -> {q}")
    aoss_response = query_docs(q, embeddings, aoss_client, vector_index)
    context = ""
    for r in aoss_response['hits']['hits']:
        # each hit's _source holds the chunk text and its metadata
        s = r['_source']
        print(f"{s['metadata']}\n{s['text']}")
        context += f"{s['text']}\n"
        print("----------------")
    return context
```

Combining everything, the RAG workflow works as shown below.
```python
# `client`, `aoss_vector_index`, and `PROMPT_TEMPLATE` are defined elsewhere in the notebook.

# 1. Start with the query
q = "What versions of XGBoost are supported by Amazon SageMaker?"

# 2. Create the context by finding similar documents from the knowledge base
context = create_context_for_query(q, embeddings, client, aoss_vector_index)

# 3. Now create a prompt by combining the query and the context
prompt = PROMPT_TEMPLATE.format(context, q)

# 4. Provide the prompt to the LLM to generate an answer to the query based on context provided
response = claude_llm(prompt)
```

Here is an example of a sample question answered first with just the question in the prompt, that is, without any additional context. The answer without context is inaccurate.
Figure 17: Answer with prompt alone
We then ask the same question, but this time with the additional context retrieved from the knowledge base included in the prompt. Now the inaccuracy in the earlier response is addressed, and we also have attribution for the source of this answer (notice the underlined text for the filename and the actual answer)!
Figure 18: Answer with prompt and context
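The workflow formats a `PROMPT_TEMPLATE` that this post does not show, and `claude_llm` hides the underlying Bedrock request. The sketch below is a hypothetical stand-in for both: a template with two positional `{}` slots (context first, then question, matching `PROMPT_TEMPLATE.format(context, q)`) in the `Human:`/`Assistant:` turn format that Claude v2 text completions expect, and a direct `bedrock-runtime` `invoke_model` call using the same inference parameters as the LangChain setup. The template wording is ours, not the notebook’s.

```python
import json
from typing import Any

# Hypothetical template; the notebook's actual PROMPT_TEMPLATE is not shown in this post.
# Claude v2 text completion expects the "\n\nHuman: ...\n\nAssistant:" turn format.
PROMPT_TEMPLATE = (
    "\n\nHuman: Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, say that you don't know.\n\n"
    "<context>\n{}\n</context>\n\n"
    "Question: {}\n\nAssistant:"
)


def invoke_claude_v2(runtime_client: Any, prompt: str) -> str:
    """Call Claude v2 through a bedrock-runtime client and return the generated text."""
    body = {
        "prompt": prompt,
        "max_tokens_to_sample": 300,
        "temperature": 0.5,
        "top_k": 250,
        "top_p": 1,
        "stop_sequences": [],
    }
    response = runtime_client.invoke_model(
        modelId="anthropic.claude-v2",
        body=json.dumps(body),
        contentType="application/json",
        accept="application/json",
    )
    # The response body is a stream; its JSON payload carries the text under "completion".
    return json.loads(response["body"].read())["completion"]


# Example usage (requires AWS credentials and model access in your Region):
#   import boto3
#   runtime = boto3.client("bedrock-runtime")
#   print(invoke_claude_v2(runtime, PROMPT_TEMPLATE.format(context, q)))
```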
Clean up
To avoid incurring future charges, delete the resources. You can do this by first deleting all the files from the S3 bucket created by the CloudFormation template and then deleting the CloudFormation stack.
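The cleanup can also be scripted. The sketch below empties the bucket with a `list_objects_v2` paginator and `delete_objects`, then requests stack deletion with `delete_stack`; the bucket and stack names are hypothetical. It deletes only current object versions, so if versioning were enabled on the bucket, older versions would also need removing before the stack could delete it.

```python
from typing import Any


def empty_bucket(s3_client: Any, bucket: str) -> int:
    """Delete every (current-version) object in bucket; return how many keys were deleted."""
    deleted = 0
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            # A list_objects_v2 page holds at most 1,000 keys, which is also the
            # per-request limit for delete_objects.
            s3_client.delete_objects(Bucket=bucket, Delete={"Objects": keys, "Quiet": True})
            deleted += len(keys)
    return deleted


def tear_down(s3_client: Any, cfn_client: Any, bucket: str, stack_name: str) -> int:
    """Empty the stack's bucket, then start deleting the CloudFormation stack."""
    count = empty_bucket(s3_client, bucket)
    # delete_stack only starts deletion; the "stack_delete_complete" waiter can block until done.
    cfn_client.delete_stack(StackName=stack_name)
    return count


# Example usage (hypothetical names):
#   import boto3
#   tear_down(boto3.client("s3"), boto3.client("cloudformation"), "my-kb-bucket", "my-rag-stack")
```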
Conclusion
In this post, we showed how to create an enterprise-ready RAG solution using a combination of AWS services and open-source Python packages.
We encourage you to learn more by exploring Amazon Titan models, Amazon Bedrock, and OpenSearch Service and building a solution using the sample implementation provided in this post and a dataset relevant to your business. If you have questions or suggestions, leave a comment.